The Significance of Tumor Microenvironment Score for Breast Cancer Patients

Purpose This study was designed to clarify the prognostic value of tumor microenvironment score and abnormal genomic alterations in TME for breast cancer patients. Method The TCGA-BRCA data were downloaded from TCGA and analyzed with R software. The results from analyses were further validated using the dataset from GSE96058, GSE124647, and GSE25066. Results After analyzing the TCGA data and verifying it with the GEO data, we developed a TMEscore model based on the TME infiltration pattern and validated it in 3273 breast cancer patients. The results suggested that our TMEscore model has high prognostic value. TME features with the TMEscore model can help to predict breast cancer patients' response to immunotherapy and provide new strategies for breast cancer treatment. Signature 24 was first found in breast cancer. In focal SCNAs, a total of 95 amplified genes and 169 deletion genes in the TMEscore high group were found to be significantly related to the prognosis of breast cancer patients, while 61 amplified genes and 174 deletion genes in the TMEscore low group were identified. LRRC48, CFAP69, and cg25726128 were first discovered and reported to be related to the survival of breast cancer patients. We identified specific mutation signatures that correlate with TMEscore and prognosis. Conclusion TMEscore model has high predictive value regarding prognosis and patients' response to immunotherapy. Signature 24 was first found in breast cancer. Specific mutation signatures that correlate with TMEscore and prognosis might be used for providing additional indicators for disease evaluation.

Through the analysis of immune cell interactions, genome, and transcriptome alterations in the breast cancer microenvironment, it also suggested that abnormal alterations in TME have important effects on the occurrence, development, metastasis, immunotherapy, and prognosis [23][24][25][26][27]. In order to better understand abnormal alterations in TME among breast cancer patients, we reanalyzed published datasets to deconvolve (CIBERSORT) and determine the TME profile of breast cancer samples [28]. Moreover, we correlated the TME phenotypes with genomic characteristics and pathologic features of breast cancer, thereby providing a new model for breast cancer prognosis.

Published Datasets.
Datasets GSE96058, GSE124647, and GSE25066 were downloaded from GEO (https://www .ncbi.nlm.nih.gov/geo/), while the data of TCGA-BRCA (RNA-seq, miRNA expression data, methylation, and clinical data) were downloaded from the UCSC Xena browser (https://xenabrowser.net/datapages/). Mutation data of TCGA-BRCA were downloaded by R TCGAbiolinks. The raw data related to Affymetrix were checked for background adjustment in the Affy software package by the RMA algorithm [29]. The raw data related to Illumina were processed by the Lumi software package.
After removing duplicate data, 3273 samples of GSE96058 with information of RNA-seq expression and clinical data were obtained and used for downstream analysis. Data of GSE124647 (140 samples) and GSE25066 (508 samples) were used for validation. Genome-Wide Human SNP 6.0 copy number segment data of TCGA-BRCA were downloaded from Firebrowse (http://firebrowse.org/) and processed using GISTIC 2.0. After removing duplicated data and samples without survival information, 1057 transcriptome samples were validated for TMEscore [10]. Among these 1057 samples, 1039 samples had CNV data, 1042 samples had SNP data, 1038 samples had miRNA data, and 760 samples had methylation data. The basic characteristics of the above 4 datasets are displayed in (Supplementary  Table 1).

Comprehensive Analysis for TME
2.2.1. Proportion of Infiltrating Cell Evaluation and TME Cluster Identification. The CIBERSORT algorithm and leukocyte signature matrix (LM22) gene signature were used for calculating immune cell infiltration of 22 human immune cell phenotypes [28,30]. CIBERSORT was considered as a deconvolution algorithm that used a set of gene expression values to calculate a minimal representation for each cell type [30]. Based on these values, support vector regression could be used to infer the cell type ratio for data from large tumor samples with mixed cell types [30]. We used standard annotation files to prepare gene expression profiles and uploaded the data to the CIBERSORT web portal (http://cibersort.stanford.edu/). Then, we used LM22 signatures and 1,000 permutations to run the algorithm. By applying the microenvironmental cell population counting method to evaluate the proportion of stromal cells, the absolute abundance of eight immune cells and two stromal cell populations in heterogeneous tissues from transcriptome data was quantitatively analyzed [10,30].

Unsupervised
Clustering to Identify TME Patterns and Tumor Sample Classification. From the immune cell proportion data analyzed by CIBERSORT [28,30], elbow (the error of the squares sum within the WSSE group, this method was used to find the optimal number of clusters by finding the "elbow point") ( Figure 1(a)) and gap statics (the point where k drops fastest the k value corresponding to the maximum gap value) were applied to evaluate the best number of the categories k value, and ConsensusClusterPlus R package was used for classification to obtain TME cluster (k-means, Euclidean, and ward.D) [31][32][33]. In order to obtain stable classification, the above procedure was repeated for 1000 times. Then, TME clusters were combined with survival data to clarify whether this classification was related to survival (Figures 1(b) and 1(c)).

TME Scoring according to the Differentially Expressed
Genes (DEG) among TME Clusters. Based on the above results, different TME clusters were mapped to the RNAseq data and screened for DEG in different samples by R limma package (P < 0:05 and jlog 2 FCj > log 2 (1.5)) [34]. The Benjamini-Hochberg correction was used to adjust P value for multiple testing [35]. After selecting categoryspecific differential genes, redundant genes were removed by random forest method [36]. Signature genes were obtained and functional enrichment analysis was applied to annotate the mainly enriched pathways. Genes were divided into two categories by the Cox regression model according to the coefficient values (positive or negative). Referring to gene expression grade index (GGI) score, TMEscore was calculated by the following formula [37]: where X is the positive expression value of the Cox coefficient responding to a gene set and Y is the expression value of the Cox coefficient involving gene set. The samples were then divided into TMEscore high and TMEscore low groups based on the median. Based on the above results, other data, such as TCGA and GEO, could be applied to verify this model. Then, TMEscore would be divided into TMEscore high and low groups according to the median, and the correlation between the two groups and prognosis would be analyzed. In the end, the correlation between the top 10 differentially expressed genes (DEG) and the prognosis was validated by online      Figure 1: The pattern classification of tumor microenvironment (TME). (a) Optimal number of clusters: K value calculated by the elbow method and gap statics algorithm. The ordinate axis represents total within sum of square; the abscissa axis represents the number of clusters K. (b) Consensus matrix heat map (K = 3): ConsensusClusterPlus was used for unsupervised class discovery (1000 iterations, k = 1 : 10). The optimal k value of 3 was determined using the elbow method and gap statics, combined with the correlation between the final classification and survival. (c) The distribution ratio of all kinds of immune cells in different TME clusters. (d) Clustering heat map of the distribution ratio of all kinds of immune cells in different TME clusters. (e) Survival analysis for different TME clusters: the red curve represents the TME cluster 1, the blue curve represents the TME cluster 2, and the yellow curve represents the TME cluster 3. The ordinate axis represents the probability of survival, and the abscissa axis represents the survival days. 5 BioMed Research International (https://software.broadinstitute.org/cancer/cga/absolute_ download). By leveraging copy number and mutation data, ABSOLUTE was able to estimate the purity and ploidy of samples.

Comprehensive Analysis
2.4.1. TME and Gene Expression Correlation Analysis. Gene expression profile (mRNA and miRNA) was used to identify genes specifically expressed in different subgroups. Later, functional enrichment analysis of specifically expressed genes was performed to elucidate the differences in the biological functions of different TME subgroups [20].

Prognostic
Evaluation of TME Cluster and TMEscore. The differentially expressed genes (DEG), miRNAs, and methylation sites were firstly identified between TMEscore subgroups (TMEscore high vs. low). After combining clinical data, survival analysis of these DEG, miRNAs, and methylation sites was further performed to determine whether they were related to clinical outcomes [34]. The molecular and clinical characteristics of different TME model subgroups were described through multidimensional data, and then, they were used for constructing the landscape map of the studied tumor.

2.4.3.
Exploring the Relationship between TMEscore and the Prognosis of Immune Checkpoint Inhibitor (ICI) Treatment. TMB is a feature that is known to be significantly related to immunotherapy and can be used to predict the efficacy of immunotherapy [38]. ROC was used to evaluate the predictability of TMB, TME group, and TMB+TME group regarding the effect of immunotherapy. The prognostic significance of the TME cluster and TMEscore was evaluated by the Kaplan-Meier curves and Cox proportional hazard regression models. The prognostic value of the TMEscore was further validated in several datasets with different biological or therapeutic background, including TCGA-BRCA, GSE124647 (metastatic breast cancer receiving endocrine therapy), and GSE25066 (patients receiving neoadjuvant taxane-anthracycline chemotherapy). TIDE (http://tide.dfci .harvard.edu/) was used for evaluating the clinical effects of ICI therapy whereby a higher tumor TIDE score was related to a poorer responsiveness to ICI and prognosis.

Results
3.1. The Breast Cancer TME Landscape. We used CIBER-SORT to estimate the abundances of 22 immune cell types (memory B cells, activated dendritic cells, M0 macrophages, etc.) in 3273 different breast cancer RNA-seq datasets (S Figure 1A) and correlated the immune cell profile with survival (S Figure 1B). The basic information of 22 kinds of immune cells in 3273 samples are provided in Supplementary table 1. Next, we used ConsensusClusterPlus for unsupervised class discovery (1000 iterations, k = 1 : 10). The optimal k value of 3 was determined using the elbow method and gap statics, combined with the correlation between the final classification and survival (Figures 1(a)-1(e) and Methods).
According to the above TME classification (k = 3; Figure 1(b)), we performed gene expression analysis using limma and identified 552 differentially expressed genes (DEGs) (P < 0:05, jlog 2 FCj > log 2 (1.5)). Unsupervised clustering then classified the differentially expressed genes into three groups (S Figure 2A). Functional enrichment analysis on 177 nonredundant genes using R ClusterProfiler revealed that this gene set was significantly enriched in immune-related pathways such as lymphocyte migration, lymphocyte chemotaxis, and leukocyte chemotaxis (S Figure 2B). A Cox regression model was used to determine the relationship between DEGs and the survival of samples. Next, genes were divided into 2 categories according to their coefficient values, and samples were divided into two groups based on calculated high or low TMEscores.
We found that the samples in the high TMEscore group had a good prognosis, while samples in the low TMEscore group had a poor prognosis (S Figure 2C). This finding indicated that clustering samples based on their immune cell profile combined with a TMEscore could predict the prognosis of breast cancer patients.
We next evaluated the TMEscore model using datasets representing metastatic breast cancer (TCGA-BRCA), breast cancer receiving endocrine therapy (GSE124647), and breast cancer receiving neoadjuvant taxane-anthracycline chemotherapy (GSE25066) (S Figure 3A-D). We found that the calculated TMEscore could effectively predict the prognosis in these samples. Namely, there is a significant negative correlation between TMEscore and mutational load of metastatic breast cancer samples (Spearman coefficient R = −0:44, P < 2:2 × 10 −16 ) (Figures 2(a) and 2(b)). TCGA-BRCA was grouped and evaluated according to luminal A, luminal B, basal, and Her-2. The TMEscores between different subtypes were significantly different. The TMEscores of basal and Her 2 were significantly lower than the others, and luminal B was the second highest, while luminal A was the highest (Figure 2(c)). The overall survival cure is displayed in (Figure 2         BioMed Research International we identified 96 mutation contexts and counted their frequencies in the BRCA tumor samples (S Figure 4A and 4B). In order to determine the relationship between the mutation frequency distribution of BRCA tumor samples and the signature included in COSMIC, we performed nonnegative matrix decomposition of the frequency matrix with 1042 samples in rows and 96 mutation types in columns. After extracting mutational characteristics of 3 somatic point mutations, the similarity between the extracted features and the signature collected by cosmic was analyzed. Through analysis, we found that the mutational spectrum of the high TMEscore group was mainly related to Signature 2, Signature 10, and Signature 30 (S Figure 4C), while the low TMEscore group was mainly related to Signature 2, Signature 3, and Signature 6 (S Figure 4D).

Analysis of Copy Number Variation (CNV).
Two sets of BRCA samples were analyzed by GISTIC software. In the high TMEscore group, 1q allelic amplification and 16q allelic deletion were the most significant alterations (Figure 4(a)), while 1q allelic amplification and 17p allelic deletion were the most significant alterations in the low TMEscore group.
Additionally, a total of 41 amplifications and 12 copy number deletions were detected among tumor samples in the high TMEscore group (Figure 4(b)). Among them, 11q13.3 was the most significant in the amplified region, while 11q23.1 was the most significant in the deletion region. In the low TMEscore group, 41 amplification and 24 deletions were found. The most significant amplification region was located at 8q24.21; and the most significant deletion region was located at 8p23.2 (Figure 4(c)).
We then applied Pancancer survival analysis of genes across 32 types of cancers in both high and low TMEscore groups (Methods http://starbase.sysu.edu.cn/ panGeneSurvivalExp.php). A total of 95 amplified genes and 169 deletion genes in the high TMEscore group and while 61 amplified genes and 174 deletion genes in the low TMEscore group were found to be significantly related to the prognosis of breast cancer patients (Tables 1 and 2; S Figure 5). In the high TMEscore group, the chromosomal locus with the highest number of amplified genes was located at 8q24.21 (n = 16; S Figure 5A1), and the highest number of deleted genes was located at 1p36.11 (n = 52; S Figure 5A2). However, in the low TMEscore group, the chromosomal locus with the highest number of amplified genes was found at 10p14 (n = 11, S Figure 5B1), and the highest number of deleted genes was found at 1p36.13 and 3p14.2 (n = 36; S Figure 5B2).
Based on the CNV information of each tumor sample, the tumor purity and ploidy was evaluated by ABSOLUTE software (Figure 4(e)). Tumor purity ranged from 0.16 to 1, and tumor cell genome ploidy ranged from 1.54 to 9.79, suggesting that genome disorder was common during tumorigenesis. Moreover, we noted a significant difference in the ploidy of the tumors in the high/low TMEscore groups (t-test, P = 2:337e − 08, and P = 0:03075; Figure 4(d)).

mRNA Differential Expression
Analysis. Gene expression analysis comparing samples with high and low TMEscores (adj:P < 0:05, jlog 2FCj > 1) followed by functional annotation (Methods) identified 782 differentially expressed genes (DEGs). Notably, DEGs were mainly upregulated in the samples with high TMEscores (Figures 5(b) and 5(c)). Most DEGs were enriched in the cell cycle, cell division, and other related pathways, indicating that the major biological differences between two sets of samples lie in cell proliferation (Figures 5(d) and 5(e)).

Analysis of Differences in the Expression of Methylation
Sites. In order to explore differences in the methylome of breast cancer samples, we downloaded TCGA-BRCA methylation chip data and analyzed 760 samples with significantly high or low TMEscores (adj:P < 0:05, absolute difference value > 0:15, Methods, Figure 6(a)). Identifying miRNAs, mRNAs, and methylation sites that were Red represents the high TMEscore group, and the other one represents the low TMEscore group. * represents statistical differences in frequency between the two groups. (b) Distribution of copy number amplification and deletion regions in high TMEscore group: 11q13.3 was the most significant in the amplified region, and 11q23.1 was the most significant in the deletion region. (c) Distribution of copy number amplification and deletion regions in TMEscore_low group: the most significant amplification region was located at 8q24.21, and the most significant deletion region was located at 8p23.2. (d) Ploidy analysis results in high and low TMEscore groups: * represents statistical differences in frequency between the two groups. (e) Purity analysis results in high and low TMEscore groups: * * represents statistical differences in frequency between the two groups.  13 BioMed Research International correlated with TMEscores allowed us to proceed to determine whether or not these factors were correlated with the survival. 217 significantly different methylation sites were detected.

Survival Analysis.
According to the expression values of the above differential miRNA, genes, methylation sites, or methylation levels (whether they were greater than the median of the expression value in all samples), the samples were divided into 2 groups. Log-rank test was used to determine whether these differential mRNA, miRNA, and methylation sites were related to the survival. A total of 6 miRNAs, 124 genes, and 67 methylation sites related to the survival (P < 0:05) were obtained. For example, we found that miRNA hsa-mir-1307, gene LRRC48, and methylation site cg25726128 were significantly correlated with the survival of BRCA patients (Figures 6(b)-6(d)).

The Correlation between TME and Immune Checkpoint
Inhibitor (ICI) Treatment. Next, we used TIDE (tumor immune dysfunction and exclusion, http://tide.dfci.harvard .edu/) to evaluate the clinical effects of immune checkpoint inhibitor (ICI) therapy in the two sets of BRCA samples. As shown in (Figure 6(e)), it could be seen that the TIDE score of the high TMEscore group was significantly higher than that of the low TMEscore group (wilcox.test, P value = 1.067e-12).
ROC was used to evaluate the predictive ability of TMB, TME group, TMB+TME group on the effect of immunotherapy (TIDE score was used as the immune efficacy score). The result revealed that TME group was better than TMB as a prognostic tool (roc.test, P value = 0.003749, Figure 6(f)). [39]. Therefore, combined with the TMEscore high samples with better prognosis in this analysis, the MSI was analyzed. According to the MSI score results predicted by TIDE, the samples were divided into MSI high and low groups (high: 146; low: 911). We found that the TMEscores responding to the two sets of MSI-high/low samples were significantly different (wilcox.test, P = 2:589e − 09) (Figure 6(g)).

Comprehensive Genome Landscape of Tumor Samples.
According to the TMEscore grouping information (TMEscore, TMEscore_group), mutations (purity, ploidy, and TMB), and clinical information (STAGE OS_STATUS) of the sample, the comprehensive genome landscape of the tumor sample was depicted, which is shown Figure 6(h).

Discussion
Breast cancer is a heterogeneous disease with highly variable clinical outcomes [40][41][42]. In the United States, although the incidence rate of breast cancer has slightly increased in recent years, the overall mortality rate has been in decline [43]. The overall decline in the mortality of breast cancer patients could be credited to committed and diverse studies in this field [6][7][8][40][41][42][43][44][45]. Recently, the concept of comprehensive immunotherapy for breast cancer had been gradually developed [45,46], and immune-related indicators have been demonstrated to have a significant impact on the prognosis of various tumors including breast cancer [19][20][21][22]. We therefore profiled the tumor immunological microenvironment and determined its predictive value for breast cancer patients.  It is reported that a large number of macrophage infiltration indicate a poor prognosis for cancer patients [47]. In vivo studies suggest that macrophages can stimulate angiogenesis, promote tumor cell extravasation, increase tumor cell migration, and further promote the occurrence and malignant progression [47]. Immunotherapy targeting macrophages may bring new hopes to cancer patients [48]. Therefore, clarifying the relationship between macrophages and the prognosis of cancer patients in clinical samples may have a significant role in promoting drug targeting macrophages. Through a comprehensive analysis of the infiltration status of 22 immune cells (S Figure 1A) and their correlation with survival (S Figure 1B), we found that resting Mast cell was a favorable factor for the OS, and M0 macrophage was a risk factor for the OS, which were rarely reported in breast cancer [40]. Further analysis also showed that the higher M0 macrophage infiltration was correlated to the worse prognosis of breast cancer patients (Figures 1(d) and 1(e)). Therefore, the M0 macrophage infiltration ratio may be used as an indicator to predict the prognosis of breast cancer patients in the future.
TMEscore, rarely reported in breast cancer, was considered to be related to the prognosis of cancer patients [10]. We therefore used a similar approach to evaluate the rela-tionship between TMEscore and the survival (S Figure 2A) [10]. Our analysis revealed that 177 genes of interest were significantly enriched in immune-related pathways such as lymphocyte migration, lymphocyte chemotaxis, and leukocyte chemotaxis (S Figure 2B). The Cox regression model was used to determine the relationship between those genes and the survival of samples. All genes were divided into 2 categories according to the coefficient value of them, and all samples were scored by TMEscore using the TMEscore calculation formula. We found that there was a positive correlation between TMEscore and the survival of patients (S Figure 2C), which was never been reported by others.
Further validation of TMEscore was conducted in different subgroups (S Figure 3). All analysis results indicated that the obtained TMEscore might be a good characterization of the prognosis of clinical samples regarding their survival (S Figure 3). The evaluation effect of TMEscore model also indicated that TMEscore was a very good prognostic indicator for breast cancer patients (Figure 2(a)). Our strata analysis found that TMEscores were different in different pathological subgroups (luminal A, luminal B, basal, and Her 2; Figure 2(c)). The subgroup analysis based on TMEscore Hsa-miR-934 Hsa-miR-301a-3p Hsa-miR-33a-3p Hsa-miR-18a-5p Hsa-miR-30a-3p Hsa-miR-19a-3p Hsa-miR-105-5p Hsa-miR-9-5p Hsa-miR-17-3p Hsa-miR-3200-3p 17 BioMed Research International also suggested that TMEscore was a much better indicator for evaluating the prognosis and survival of breast cancer patients than their pathological phenotypes (Figures 2(d)-2(f)). Furthermore, there was a significantly negative correlation between TMEscore and TMB ( Figure 2(b); P < 2:2 × 10 −16 ). Although the relationship between TMEscore and the prognosis has been reported in other malignant tumors [10,49], this is the first report in breast cancer. These findings lay the foundation for the clinical application of TMEscore in the future.

20
BioMed Research International Through mutation analysis, we found that missense mutations were the most common alterations in BRCA [50,51], and C>T was the most common type of SNP mutation in both high TMEscore and low subgroup (S Figure 4A and S Figure 4B). Similarities between the mutation characteristics of the TMEscore groups and those of the COSMIC mutation signature were analyzed and provided (S Figure 4C and S Figure 4D). Through analysis, we found that the mutation profile of the high TMEscore group was mainly related to Signatures 2, 10, 30, 24, and 1 (S Figure 4C), while the low TMEscore group was mainly  The survival analysis results of cg25726128 in different subgroups: the abscissa axis represents the survival time, and the ordinate axis represents the survival probability. The red curve represents high expression, and the blue curve represents low expression. (e) Immunotherapy efficacy score calculated by TMEscore group: the abscissa axis represents the TMEscore subgroup, and the ordinate axis represents TIDE. * * represents that the analysis result is statistically significant. (f) Using ROC analysis to evaluate the predictive ability of TMB, TMEgroup, and TMB+TME group on the effect of immunotherapy. (g) The relationship between MSI and TMEscore: the abscissa axis represents different MSI subgroups, and the ordinate axis represents TMEscore. * * represents that the analysis result is statistically significant. (h) Comprehensive genome landscape of BRCA (47 survival-related genes, P < 0:01).

BioMed Research International
genes, MUC4 was the gene with the highest mutation rate (24%; Figure 3 A6), which was reported to be uncorrelated with prognosis [59,60].
Copy number alteration was a driving factor for the occurrence and development of cancer, and it played an important role in the entire developmental process of cancer [52,61]. Consistent with earlier reports, the most prevalent somatic copy number alteration (SCNA) in our study was either very short (focal) or arm-level [61]. Through analysis, we found that 1q amplification was the most common armlevel SCNAs, regardless of TMEscore (Figure 4(a)). Incidentally, chromosome 1q21.3 amplification was considered to be a trackable biomarker and actionable target for breast cancer recurrence [62]. 16q deletion was the most common arm-level SCNA among all samples (Figure 4(a)), while 17p deletion is the most frequently occurring SCNA in the low TMEscore subgroup. Moreover, we found that in most cases, the incidence frequency of deletion or amplification in the same chromosome region seems to be significantly higher in the low TMEscore subgroup than that of the high TMEscore subgroup, which was rarely reported in other studies (Figure 4(a)) [52,[61][62][63]. Similar status could also be seen in focal SCNAs (Figures 4(b) and 4(c)). In other words, genome amplification or deletion alterations are much more likely to be found in the low TMEscore subgroup (Figures 4(a)-4(c)).
83 amplification and 36 deletion loci belonging to SCNA regions were found, and most of them have been reported in other studies [64]. However, the relationship between genes in these regions and the prognosis of breast cancer patients has not been systematically reported [61][62][63][64]. Pancancer survival analysis of genes across breast cancer in focal SCNAs was conducted and summarized (Tables 1 and 2 and S Figure 5). A total of 95 amplified genes and 169 deletion genes in the high TMEscore group were found to be significantly related to the prognosis of breast cancer (Table 1; S Figure 5), while 61 amplified genes and 174 deletion genes in the low TMEscore group were identified (Table 2; S Figure 5). In the high TMEscore group, the chromosomal loci with the highest frequency of amplified genes was located at 8q24.21 (n = 16; S Figure 5A1), and the highest frequency of deleted genes was located at 1p36.11 (n = 52; S Figure 5A2), while in the low TMEscore group, the chromosomal loci with the highest frequency of amplified genes were found at 10p14 (n = 11, S Figure 5B1), and the highest frequency of deleted genes was found at 1p36.13 and 3p14.2 (n = 36; S Figure 5B2). The above results in SCNAs regions were all discovered and reported for the first time. Our results would provide a framework for clinical work, especially for the comprehensive evaluation of the prognosis for breast cancer patients.
Through comprehensive analysis and evaluation, abnormal changes in tumor purity and ploidy, which were reported in other cancers [65][66][67], were also detected in breast cancer. This discovery suggested that genome disorder was a common phenomenon during carcinogenesis, but the correlation between them and breast cancer remains to be elucidated (Figures 4(d) and 4(e)).
As shown in (Figure 6(e)), it could be seen that the TIDEscore of the high TMEscore group was significantly higher than that of the low TMEscore group (P value = 1.067e-12). The higher tumor TIDE prediction score was related to the poorer response to immune checkpoint inhibitor therapy. In the previous analysis, the high TMEscore group has a better prognosis, and the low TMEscore group has a poor prognosis. Corresponding to the results of this TIDE assessment, it showed that the high TMEscore group had a good prognosis, but the low TMEscore group had a better response on immune checkpoint inhibitor (ICI) therapy. ROC was used to evaluate the predictive ability of TMB [38], TME group, and TMB+TME group on the effect of immunotherapy (Figure 6(f)). We found that TME was better than TMB as a prognostic tool (roc.test, P value = 0.003749). TMEscores responding to the two sets of MSI were significantly different (wilcox.test, P = 2:589e − 09) (Figure 6(g)). Negative correlation between TMEscore and TMB is displayed in Figure 2(b) (P < 2:2 × 10 −16 ). Although significant correlation between TMB and MSI was reported in colorectal and pancreatic cancer [74], the relationship between TMEscore and MSI in BRCA still needed to be validated. But this would not weaken the potential of TMEscore as an indicator for the prognostic evaluation in breast cancer 22 BioMed Research International patients ( Figure 6(f)). It could be found that genomic alterations in those two TMEscore groups were significantly different from each other (Figure 6(h)). In a word, it showed that the TME method could be effectively used to predict tumor prognosis and the responsiveness to ICI therapy. Compared with traditional scoring methods, the advantage of the TMEscore scoring method is that more cells are included in the score [10,38]. Twenty-two kinds of immune cells are included in this project [10]. We first clustered them according to the immune infiltration situation and then screened the genes related to infiltration. In this step, the candidate gene set was expanded to ensure the accuracy. Finally, the gene expression grade index (GGI) score was used to effectively simplify the model and facilitate calculation [37]. Compared with GGI, TMEscore has fewer requirements for the number of samples, and the calculation process is much simpler.

Conclusion
TMEscore could be effectively used to predict tumor prognosis and the efficacy of ICI. Signature 24 was first found in breast cancer. In focal SCNAs, a total of 95 amplified genes and 169 deletion genes in the high TMEscore group are found to be significantly related to the prognosis of breast cancer patients, while 61 amplified genes and 174 deletion genes in the low TMEscore group were found. LRRC48, CFAP69, and cg25726128 were first discovered and reported to be related to the survival of breast cancer patients.

TME:
Tumor microenvironment iTME: Immune tumor microenvironment LM22: Leukocyte signature matrix 22 DEG: Differentially expressed genes UCSC: University of California, Santa Cruz TCGA: The Cancer Genome Atlas GEO: Gene Expression Omnibus CNV: Copy number variation GDC: Genomic Data Commons CDF: Cumulative distribution function SNP: Single nucleotide polymorphisms GISTIC: Genomic Identification of Significant Targets in Cancer TIDE: Tumor immune dysfunction and exclusion TMB: Tumor mutation burden MCRs: Minimum common regions SCNAs: Somatic copy number alterations ICI: Immune checkpoint inhibitor GGI: Gene expression grade index.

Data Availability
All data involved in this study can be downloaded from the websites mentioned in the above or obtained free of charge by contacting the corresponding author. (A) Consistent clustering results of differential genes: Con-sensusClusterPlus was used for unsupervised class discovery (1000 iterations, k = 1 : 10). The optimal k value of 3 was determined using the elbow method and gap statics, combined with the correlation between the final classification and survival. The limma package of R was used to screen different types of differentially expressed genes (P < 0:05, jlog 2FCj > log 2 (1.5)). 522 differentially expressed genes were obtained, and unsupervised cluster analysis was performed, and the samples were divided into three categories. (B) Functional enrichment analysis results of signature genes: functional enrichment analysis on 177 nonredundant genes using R ClusterProfiler revealed that this gene set was significantly enriched in immune-related pathways such as lymphocyte migration, lymphocyte chemotaxis, and leukocyte chemotaxis. (C) Survival analysis results of all enrolled samples based on TMEscore: a Cox regression model was used to determine the relationship between DEGs and the survival of samples. Next, genes were divided into 2 categories according to their coefficient values, and samples were divided into two groups based on high or low calculated TMEscores. Supplementary Figure 3: validation and model evaluation using TCGA-BRCA, GSE124647, and GSE25066. (A) Survival analysis for TCGA-BRCA based on TMEscore model: the red curve represents the high TMEscore group, and the blue curve represents the low TMEscore group; the ordinate axis represents the probability of survival, and the abscissa axis represents the survival days.
(B) Survival analysis for GSE124647 (metastatic breast cancer) based on the TMEscore model: the red curve represents the high TMEscore group, and the blue curve represents the low TMEscore group; the ordinate axis represents the probability of survival, and the abscissa axis represents the survival days. (C1) Survival analysis for GSE25066 (chemotherapy) based on the TMEscore model: the yellow curve represents the high TMEscore group, and the blue curve represents the low TMEscore group; the ordinate axis represents the probability of survival, and the abscissa axis represents the survival days. (C2) ROC curve of GSE25066 (AUC = 0:612): the ordinate axis represents the detection sensitivity of the ROC curve, and the abscissa axis represents the detection specificity of the ROC curve. (D1) Survival analysis for GSE124647 (endocrine therapy) based on the TMEscore model: the yellow curve represents the high TMEscore group, and the blue curve represents the low TMEscore group; the ordinate axis represents the probability of survival, and the abscissa axis represents the survival days.
(D2) ROC curve of GSE124647 (AUC = 0:557): the ordinate axis represents the detection sensitivity of the ROC curve, and the abscissa axis represents the detection specificity of the ROC curve. Supplementary Figure 4: the results of mutation spectrum analysis: nonnegative matrix decomposition of the frequency matrix with 1,042 samples in rows and 96 mutation types in the columns were performed, mutational characteristics of 3 somatic point mutations were extracted, and then, the similarity between the extracted features and the signature collected by cosmic was analyzed. (A) The frequency distribution of 96 mutation types in high TMEscore group: the most frequent type of mutation is C>T. (B) The frequency distribution of 96 mutation types in low TMEscore group: the most frequent type of mutation is C>T.
(C) Similarity between the mutation characteristics of the high TMEscore group and those of the cosmic mutation signature: the mutation spectrum was mainly related to Signature 2, Signature 10, and Signature 30. (D) Similarity between the mutation characteristics of the low TMEscore group and those of the cosmic mutation signature: the mutation spectrum was mainly related to Signature 2, Signature 3, and Signature 6. Supplementary Figure 5: pancancer survival analysis of genes across cancers in SCNAs regions. The abscissa axis represents the chromosome locus, and the ordinate axis represents the number of genes locating in the corresponding chromosome locus. (A1) In the high TMEscore group, the distribution and number of amplified genes that are significantly related to the survival of breast cancer patients in different chromosomes. (A2) In the high TMEscore group, the distribution and number of deletion genes that are significantly related to the survival of breast cancer patients in different chromosomes. (B1) In the low TMEscore group, the distribution and number of amplified genes that are significantly related to the survival of breast cancer patients in different chromosomes. (B2) In the low TMEscore group, the distribution and number of deletion genes that are significantly related to the survival of breast cancer patients in different chromosomes. (Supplementary  Materials)